Data processor

ABSTRACT

An instruction set is provided which has a first field for describing an execution instruction that designates the content of an operation or data processing executed in at least one processing unit forming a data processing system, and a second field for describing preparation information for setting the processing unit to a state in which it is ready to execute an operation or data processing according to the execution instruction. This makes it possible to provide a control program having an instruction set in which preparation information independent of the execution instruction described in the first field is described in the second field. Accordingly, preparation for execution of the subsequent execution instruction is made based on the preparation information. Since the destination of a branch instruction is described in the second field and is therefore known in advance, problems that cannot be solved with a conventional instruction set can be solved.

TECHNICAL FIELD

The present invention relates to a control program product described with microcodes or the like, and to a data processing system capable of executing the control program.

BACKGROUND OF INVENTION

Processors (data processing systems or LSIs) incorporating an operation function, such as microprocessors (MPUs) and digital signal processors (DSPs), are known as apparatuses for conducting general-purpose processing and special digital data processing. Architectural factors that have significantly contributed to improved performance of these processors include pipelining technology, super-pipelining technology, super-scalar technology, VLIW technology, and the addition of specialized data paths (special-purpose instructions). The architectural elements further include branch prediction, register banks, cache technology, and the like.

There is a clear difference in performance between non-pipelined and pipelined processors. Basically, for the same instruction, increasing the number of pipeline stages reliably improves throughput. For example, a four-stage pipeline can ideally be expected to achieve about a fourfold increase in throughput, and an eight-stage pipeline about an eightfold increase, which means that super-pipelining improves the performance by a further factor of about two. Since progress in process technology enables segmentation of the critical paths, the upper limit of the operating frequency will be significantly improved and the contribution of the pipeline technology will further increase. However, the delay or penalty of a branch instruction has not been eliminated, and whether a super-pipeline machine succeeds depends on how much of the multi-stage delay caused by memory accesses and branches can be handled by the compiler's instruction scheduling.

Super-scalar technology simultaneously executes instructions near a program counter by means of sophisticated internal data paths. Supported also by progress in compiler optimization, this technology has become capable of executing about four to eight instructions simultaneously. In many cases, however, an instruction uses the most recent operation result and/or a result held in a register. Peak performance aside, this necessarily reduces the average number of instructions that can be executed simultaneously to a value much smaller than that described above, even when full use is made of techniques such as forwarding, instruction relocation, out-of-order execution, and register renaming. In particular, since it is impossible to execute a plurality of conditional branch instructions or the like simultaneously, the effect of super-scalar technology is further reduced. Accordingly, the contribution to improved processor performance would be on the order of about 2.0 to 2.5 times on average. Even for an extremely well-suited application, the practical contribution would be on the order of four times or less.

VLIW technology comes next. In this technology, the data paths are configured in advance so as to allow parallel execution, and the compiler performs optimization to improve parallel execution and generates appropriate VLIW instruction code. This technology adopts an extremely rational idea, eliminating the need for the circuitry that checks whether individual instructions can be executed in parallel, as is required in the super-scalar approach. Therefore, it is considered extremely promising as a means for realizing hardware for parallel execution. However, this technology is also incapable of executing a plurality of conditional branch instructions. Therefore, the practical contribution to performance would be on the order of about 3.5 to 5 times. In addition, for a processor used in an application that requires image processing or special data processing, VLIW is not an optimal solution either. This is because, particularly in applications requiring continuous or sequential processing using operation results, there is a limit to executing operations or data processing while holding the data in general-purpose registers as in VLIW. The same problem exists in conventional pipeline technology.

On the other hand, it is well known from past experience that various matrix calculations, vector calculations, and the like are conducted with higher performance when implemented in dedicated circuitry. Therefore, in the most advanced technology for achieving the highest performance, the dominant approach is based on VLIW, with various dedicated arithmetic circuits mounted according to the purpose of the application.

However, VLIW is a technology for improving the efficiency of parallel execution near a program counter. Therefore, VLIW is not very effective in, e.g., executing two or more objects simultaneously or executing two or more functions. Moreover, mounting various dedicated arithmetic circuits increases the hardware and also reduces software flexibility. Furthermore, it is essentially difficult to eliminate the penalty that occurs in executing conditional branches.

It is therefore an object of the present invention to study these problems from a standpoint different from that of the conventional technologies for increasing processor speed, and to provide a new solution. More specifically, it is an object of the present invention to provide a system, i.e., a control program product, capable of improving throughput as pipelining does while eliminating the penalty in executing conditional branches, a data processing system capable of executing the control program, and a control method therefor. It is another object of the present invention to provide a control program product capable of flexibly executing individual data processing, even complicated data processing, at high speed without having to use a wide variety of dedicated circuits specific to the respective data processing. Providing a data processing system capable of executing the program, and a control method therefor, is also an object of this invention.

SUMMARY OF THE INVENTION

The inventor of the present application found that the problems described above are caused by the limitations of the instruction set for the conventional non-pipeline technology on which the above technologies are based. More specifically, the instruction set (instruction format) of a program (microcodes, assembly codes, machine languages, or the like) defining the data processing in a processor is a mnemonic code formed from a combination of an instruction operation (execution instruction) and an operand defining the environment or interface of the registers to be used in executing that instruction. Accordingly, the whole of the processing designated by a conventional instruction set can be completely understood by looking at it; conversely, nothing about the instruction set can be known until the instruction set appears and is decoded. The present invention significantly changes the structure of the instruction set itself, thereby successfully solving the aforementioned problems that are hard to address with the prior art, and enabling significant improvement in the performance of the data processing system.

In the present invention, an instruction set is provided that includes a first field for describing (recording) an execution instruction that designates the content of an operation or data processing executed in at least one processing unit forming a data processing system, and a second field for describing (recording) preparation information for setting the processing unit to a state in which it is ready to execute an operation or data processing according to the execution instruction. Preparation information for an operation or data processing that is independent of the content of the execution instruction described in the first field of the instruction set is described in the second field. Thus, the present invention provides a control program product or control program apparatus comprising the above instruction set. This control program can be provided in a form recorded or stored on an appropriate recording medium readable by a data processing system, or in a form embedded in a transmission medium transmitted over a computer network or another communication channel.

The processing unit is an appropriate unit that forms the data processing system and into which the data processing system can be divided in terms of functionality or data path. Such units include a control unit, an arithmetic unit, and a processing unit or data flow processing unit having a somewhat compact data path that can be handled as a template or the like having a specific data path.

A data processing system according to the present invention comprises: at least one processing unit for executing an operation or data processing; a unit for fetching an instruction set including a first field for describing an execution instruction for designating content of the operation or data processing that is executed in the processing unit, and a second field for describing preparation information for setting the processing unit to a state that is ready to execute the operation or data processing that is executed according to the execution instruction; a first execution control unit for decoding the execution instruction in the first field and proceeding with the operation or data processing by the processing unit that is preset so as to be ready to execute the operation or data processing of the execution instruction; and a second execution control unit for decoding the preparation information in the second field and, independently of content of the proceeding of the first execution control unit, setting a state of the processing unit so as to be ready to execute another operation or data processing.

A method for controlling a data processing system including at least one processing unit for executing an operation or data processing according to the present invention includes: a step of fetching the instruction set including the aforementioned first and second fields; a first control step of decoding the execution instruction in the first field and proceeding with the operation or data processing by the processing unit that is preset so as to be ready to execute the operation or data processing of the execution instruction; and a second control step of decoding, independently of the first control step, the preparation information in the second field and setting a state of the processing unit so as to be ready to execute an operation or data processing.

The instruction set according to the present invention has a first field for describing an execution instruction, and a second field for describing preparation information (a preparation instruction) that is independent of the execution instruction and includes information such as registers and immediate data. Accordingly, in an arithmetic instruction, an instruction operation such as “ADD” is described in the first field, and an instruction or information specifying registers is described in the second field. This appears to be the same as the instruction set of conventional assembly code. However, the execution instruction and the preparation information are independent of each other, and therefore do not correspond to each other within the same instruction set. Therefore, this instruction set has the property that the processing to be executed by a processing unit of the data processing system, such as a control unit, cannot be completely understood from, and is not completely specified by, a single instruction set alone. In other words, the instruction set according to the present invention is significantly different from the conventional mnemonic code. In the present invention, the instruction operation and its corresponding operand, which are conventionally described in one and the same instruction set, can be defined individually and independently, so that processing that cannot be realized with the conventional instruction set is readily performed.

The preparation information for the execution instruction described in the first field of a subsequent instruction set can be described in the second field. This makes it possible to prepare for execution of an execution instruction before the instruction set including that execution instruction appears. In other words, it is possible to set the processing unit, prior to an execution instruction, to a state in which it is ready to execute the operation or data processing that is executed according to that execution instruction. For example, an instruction for operating at least one arithmetic/logic unit included in a control unit of the data processing system can be described in the first field of a certain instruction set (instruction format or instruction record), and an instruction or information defining the interfaces of that arithmetic/logic unit for the above operation, such as a source register or destination register, can be described in the second field of the preceding instruction set. Thus, before the execution instruction is fetched, the register information of the arithmetic/logic unit is decoded and the registers are set. The logic operation is then performed according to the subsequently fetched execution instruction, and the result is stored in the designated register. It is also possible to describe the destination register in the first field together with the execution instruction.

Accordingly, with the instruction set of the present invention, data processing can be conducted in multiple stages as in pipeline processing, and throughput is improved. For example, the instruction “ADD, R0, R1, #1234H” means that the register R1 and the data #1234H are added together and the result is stored in the register R0. In terms of the hardware architecture, however, it is advantageous for high-speed processing to perform the read from the register R1 and the data “#1234H” into the input registers of the data path to which the arithmetic adder ADD, i.e., the arithmetic/logic unit, belongs, overlapping with the execution cycle of the previous instruction set, one clock before the execution cycle of the execution instruction ADD. In this case, the execution cycle performs purely the arithmetic addition, so the AC characteristics (execution frequency characteristics) are improved. In conventional pipeline processing, this problem would also be improved to some degree by increasing the number of pipeline stages so as to dedicate a single stage to the read cycle from the register file. However, in conventional pipeline processing, that method necessarily increases the output delay. In contrast, the present invention can solve the problem without increasing the delay.
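The ADD example above can be illustrated with a minimal Python sketch. It is not taken from the specification: the names InstructionSet, Machine, latches and run are hypothetical, and the model simply assumes that the second field of one instruction set loads the input registers (latches) of the adder, and that the first field of the following instruction set performs only the addition and the write to the destination register.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

Source = Union[str, int]  # a register name such as "R1", or an immediate value


@dataclass
class InstructionSet:
    # First field (hypothetical encoding): (operation, destination register).
    x: Optional[Tuple[str, str]] = None
    # Second field (hypothetical encoding): two sources to load into the
    # adder's input latches for a LATER execution instruction.
    y: Optional[Tuple[Source, Source]] = None


class Machine:
    def __init__(self) -> None:
        self.regs = {"R0": 0, "R1": 0}
        self.latches = (0, 0)  # input registers of the ALU data path

    def _read(self, src: Source) -> int:
        return self.regs[src] if isinstance(src, str) else src

    def step(self, ins: InstructionSet) -> None:
        # First field: execute using inputs prepared by an earlier second field.
        if ins.x is not None:
            op, dest = ins.x
            if op != "ADD":
                raise ValueError(f"unsupported operation: {op}")
            self.regs[dest] = self.latches[0] + self.latches[1]
        # Second field: independently prepare inputs for a following instruction.
        if ins.y is not None:
            self.latches = (self._read(ins.y[0]), self._read(ins.y[1]))


def run(machine: Machine, program: List[InstructionSet]) -> None:
    for ins in program:
        machine.step(ins)


machine = Machine()
machine.regs["R1"] = 1
program = [
    InstructionSet(y=("R1", 0x1234)),  # preparation only: read R1 and #1234H
    InstructionSet(x=("ADD", "R0")),   # execution only: add, store in R0
]
run(machine, program)
print(hex(machine.regs["R0"]))  # 0x1235
```

In this sketch neither instruction set is meaningful on its own: the second one says "ADD into R0" without naming its sources, which mirrors the property that the execution instruction and the preparation information in the same instruction set do not correspond to each other.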

In the instruction set of the present invention, it is possible to describe the preparation information prior to the execution instruction. Therefore, for a branch instruction such as a conditional branch instruction, the branch destination information is provided to the control unit prior to the execution instruction. Namely, with the conventional mnemonic code, a human can understand the whole meaning of an instruction set at a glance, but the hardware cannot know it until the instruction set appears. In contrast, with the instruction set of the present invention, the whole meaning of an instruction set cannot be understood at a glance, but information associated with the execution instruction is provided before the execution instruction appears. Thus, since the branch destination is assigned prior to the execution instruction, it is also possible to fetch the instruction set at the branch destination, and to prepare in advance for the execution instruction at the branch destination.

In general, most current CPUs/DSPs have successively increased their processing speed by shifting the pipeline processing to a later stage (later on the time base). However, problems come to the surface upon execution of branches and CALL/RET in a program. More specifically, since the fetch address information has not been obtained in advance, these problems essentially cause a penalty that cannot be eliminated in principle. Of course, branch prediction, delayed branches, high-speed branch buffers, and the high-speed loop handling technology employed in DSPs have succeeded in significantly reducing this penalty. However, the problems come to the surface again when a number of successive branches occur, and it is therefore well known that those technologies provide no essential solution.

Moreover, in the conventional art, the register information required by the subsequent instruction cannot be obtained in advance. This increases the complexity of the forwarding or bypass processing used to increase the pipeline processing speed. Therefore, increasing the processing speed by the prior art causes a significant increase in hardware costs.

As described above, in the conventional instruction set, the address information of the branch destination is obtained only after the instruction set is decoded, making it essentially difficult to eliminate the penalty incurred upon execution of a conditional branch. In contrast, in the instruction set of the present invention, since the branch destination information is obtained in advance, the penalty incurred upon execution of a conditional branch is eliminated. Moreover, if the hardware has enough capacity or scale, it is also possible to fetch the preparation instruction at the branch destination so as to prepare for the subsequent execution instruction after the branch. If the branch condition is not satisfied, only the preparation is wasted, and no execution-time penalty is incurred.
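The difference described above can be sketched with a toy cycle count in Python. This is an illustration under stated assumptions, not the behaviour of any actual hardware: the dictionary keys "x", "target" and "y_target", the function run_cycles, and the value of FETCH_LATENCY are all hypothetical. In the conventional encoding the branch carries its own target, which becomes known only when the branch is decoded; in the other encoding the target is announced in the second field of the preceding instruction set.

```python
FETCH_LATENCY = 2  # assumed extra cycles to fetch a target that was not prepared


def run_cycles(program, condition):
    """Count cycles for a toy program; `condition` is the branch condition."""
    pc, cycles = 0, 0
    prepared_target = None  # branch destination announced by a second field
    while pc < len(program):
        ins = program[pc]
        cycles += 1
        next_pc = pc + 1
        if ins["x"] == "BRANCH_IF" and condition:
            if "target" in ins:
                # Conventional: target known only after decoding the branch,
                # so the fetch of the destination starts late.
                next_pc = ins["target"]
                cycles += FETCH_LATENCY
            else:
                # Target was prepared in advance; destination already fetched.
                if prepared_target is None:
                    raise ValueError("branch executed without a prepared target")
                next_pc = prepared_target
        if "y_target" in ins:
            prepared_target = ins["y_target"]
        pc = next_pc
    return cycles


conventional = [
    {"x": "NOP"},
    {"x": "BRANCH_IF", "target": 3},
    {"x": "NOP"},
    {"x": "NOP"},
]
two_field = [
    {"x": "NOP", "y_target": 3},  # second field announces the destination
    {"x": "BRANCH_IF"},
    {"x": "NOP"},
    {"x": "NOP"},
]

print(run_cycles(conventional, True), run_cycles(two_field, True))    # 5 3
print(run_cycles(conventional, False), run_cycles(two_field, False))  # 4 4
```

When the condition is not satisfied, both encodings take the same number of cycles in this model, which corresponds to the statement that only the preparation is wasted.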

Moreover, since the register information required by the subsequent instruction is known simultaneously with or prior to the instruction execution, the processing speed can be increased without increasing the hardware costs. In the present invention, a part of the processing conventionally conducted by hardware in the pipeline is instead carried out in advance by software, at the compiling or assembling stage.

In the data processing system of the present invention, the second execution control unit, which performs processing based on the preparation information, may be a unit capable of dynamically controlling an architecture that is changeable through the connections between transistors, such as an FPGA (Field Programmable Gate Array). However, dynamically changing hardware such as an FPGA takes much time, and additional hardware is required to reduce that reconfiguration time. It is also possible to store the reconfiguration information of the FPGA in a RAM having two or more banks and to execute the reconfiguration in the background, so as to change the architecture dynamically in an apparently short time. However, in order to enable the reconfiguration to be completed within several clocks, it is necessary to mount a RAM that stores the reconfiguration information for every possible combination. This does not at all essentially solve the economic problem of the long reconfiguration time of the FPGA. Moreover, because the architecture of the FPGA is designed to enable efficient mapping of hardware at the gate level, the poor AC characteristics of the FPGA at the practical level, the original problem of the FPGA, are not likely to be solved for the time being.

In contrast, in the present invention, an input and/or output interface of the processing unit is defined separately as preparation information, independently of the time of execution (execution timing) of the processing unit. Thus, in the second execution control unit or the second control step, the input and/or output interface of the processing unit can be set independently of the execution timing of the processing unit. Accordingly, in a data processing system having a plurality of processing units, the combination of data paths formed by these processing units can be controlled by the second execution control unit or the second control step independently of the execution. Therefore, an instruction recorded or described in the second field that defines an interface of at least one processing unit included in the data processing system, such as an arithmetic/logic unit, serves as a data flow designation. This improves the independence of the data path. As a result, the data flow designation can be performed while another instruction program is being executed. Also, an architecture is provided in which an internal data path of a control unit or data processing system in the idle state can be lent to a more urgent process being performed in another external control unit or data processing system.

Moreover, information defining the content of processing and/or the circuit configuration of the processing unit is also included in the preparation information. Therefore, the second execution control unit or the second control step designates the processing content (circuit configuration) of the processing unit. Thus, the data path can be configured more flexibly.

Furthermore, the second execution control unit or the second control step functions as a scheduler that manages combinations of data paths, for example by defining the interface of the arithmetic/logic unit (decoding the register information for fetching) and the interfaces of other processing units, in order to handle a wide variety of data processing. For example, where a matrix calculation process is performed for a fixed time and a filtering process is performed thereafter, the connections between the processing units within the data processing system for these processes are provided prior to each process, and the processes are performed sequentially under the control of a time counter. Replacing the time counter with another comparison circuit or an external event detector enables more complicated and flexible scheduling.
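The matrix-then-filter example can be sketched as a small time-counter scheduler in Python. The function run_schedule and the configuration names "matrix" and "filter" are hypothetical, and the sketch assumes a non-empty schedule in which each entry gives a duration in ticks and the data-path configuration that is active for that duration.

```python
def run_schedule(schedule, total_ticks):
    """Return the data-path configuration active at each tick.

    schedule: non-empty list of (duration_in_ticks, configuration_name).
    The last configuration stays active once the schedule is exhausted.
    """
    timeline = []
    index = 0
    counter = schedule[0][0]  # time counter for the current configuration
    for _ in range(total_ticks):
        timeline.append(schedule[index][1])
        counter -= 1
        # When the counter expires, switch to the next prepared configuration.
        if counter == 0 and index + 1 < len(schedule):
            index += 1
            counter = schedule[index][0]
    return timeline


schedule = [(3, "matrix"), (2, "filter")]
print(run_schedule(schedule, 5))
# ['matrix', 'matrix', 'matrix', 'filter', 'filter']
```

Replacing the `counter == 0` test with a predicate on some external event would correspond to the comparison circuit or event detector mentioned above.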

The FPGA architecture may be employed in individual processing units. However, it takes a long time to change the hardware dynamically, and additional hardware is required to reduce that time. This makes it difficult to control the hardware within the processing unit dynamically during execution of the application. If a plurality of RAMs were provided in a bank structure for instantaneous switching, switching on the order of several to several tens of clocks would require a considerable number of banks. Thus, it is basically necessary to make each of the macro cells within the FPGA independently programmable and able to detect the time or timing for changing, as a program-based control machine. However, current FPGAs cannot accommodate such a structure. Even if an FPGA were capable of accommodating that structure, a new instruction control architecture as in the present invention would be required for controlling the timing dynamically.

Accordingly, in the present invention, it is desirable to employ as the processing unit a circuit unit including a specific internal data path. Processing units having somewhat compact data paths are prepared as templates, and data-flow-type processing is designated and performed by combining the data paths of the templates. In addition, when a part of the internal data path of the processing unit is made selectable according to the preparation information or preparation instruction, the processing content of the processing unit becomes changeable. As a result, the hardware can be reconfigured more flexibly and in a short time.

A processing unit provided with an appropriate logic gate or logic gates and internal data paths connecting the logic gate or gates with input/output interfaces is hereinafter referred to as a template, since the specific data path provided in that processing unit is used like a template. Namely, the processing performed by the processing unit can be changed by changing the order of the data input to or output from the logic gates, or by changing the connection between, or the selection of, the logic gates. It is only necessary to select a part of the internal data path that is prepared in advance. Therefore, the processing can be changed in a shorter time than with an FPGA, which requires the circuitry to be changed at the transistor level. Moreover, the use of an internal data path arranged in advance for a specific purpose reduces the number of redundant circuit elements and increases the area utilization efficiency of the transistors. Accordingly, the mounting density becomes high, which leads to economical production. Moreover, by arranging a data path suitable for high-speed processing, excellent AC characteristics are obtained. Therefore, in the present invention, it is desirable that in the second execution control unit and the second control step, at least a part of the internal data path of the processing unit is selectable according to the preparation information.

It is also desirable that the second execution control unit functions as a scheduler that manages the interfaces of the processing units, so as to manage a schedule for retaining the interface of each processing unit that is set based on the preparation information.

Moreover, it is desirable that the input and/or output interfaces in a processing block formed from a plurality of processing units are designated according to the preparation information. Since the interfaces of the plurality of processing units are changed with a single instruction, the data paths associated with the plurality of processing units are changed with a single instruction. Accordingly, it is desirable that in the second execution control unit or step, the input and/or output interfaces of the processing units are changeable in units of the processing block according to the preparation information.

Moreover, it is desirable to provide a memory storing a plurality of configuration data defining the input and/or output interfaces in the processing block, and to enable the input and/or output interfaces in the processing block to be changed by selecting one of the plurality of configuration data stored in the memory according to the preparation information. When the configuration data is designated with a data flow defining instruction, the changing of the interfaces of the plurality of processing units is controlled from a program without using redundant instructions.
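A minimal Python sketch of selecting one of several stored configurations is given below. The names UNITS, CONFIG_MEMORY and ProcessingBlock are hypothetical, the two toy units stand in for processing units with fixed internal data paths, and each configuration is assumed to describe the interfaces simply as the order in which data flows through the units.

```python
# Toy processing units with fixed internal behaviour (stand-ins for templates).
UNITS = {
    "double": lambda value: 2 * value,
    "inc": lambda value: value + 1,
}

# Configuration memory: each entry defines how the units of the block are
# connected, here as the order in which the data passes through them.
CONFIG_MEMORY = [
    ["double", "inc"],  # configuration 0: input -> double -> inc -> output
    ["inc", "double"],  # configuration 1: input -> inc -> double -> output
]


class ProcessingBlock:
    def __init__(self) -> None:
        self.chain = []

    def select(self, index: int) -> None:
        """Preparation step: one selection changes all interfaces of the block."""
        self.chain = CONFIG_MEMORY[index]

    def execute(self, value: int) -> int:
        """Execution step: run the data through the currently configured path."""
        for name in self.chain:
            value = UNITS[name](value)
        return value


block = ProcessingBlock()
block.select(0)
print(block.execute(3))  # 7
block.select(1)
print(block.execute(3))  # 8
```

The point of the sketch is that the program names only an index into the configuration memory, so a single preparation instruction changes the connections of every unit in the block.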

Furthermore, a data processing system having, as processing units, a first control unit suitable for general-purpose processing, such as the arithmetic/logic unit, and a second control unit suitable for special processing, such as a plurality of data flow processing units having specific internal data paths, becomes a system LSI that is suitable for processing requiring high-speed and real-time performance, such as network processing and image processing. In the instruction set of the present invention, the execution instruction for operating the arithmetic/logic unit is described in the first field, and the preparation information defining an interface of the arithmetic/logic unit and/or the data flow processing units is described in the second field. Therefore, the instruction set of the present invention provides a program product suitable for controlling the aforementioned system LSI.

Conventionally, the only way to handle complicated data processing is to prepare dedicated circuitry and implement a dedicated instruction using that circuitry, thereby increasing the hardware costs. In contrast, in the instruction set of the present invention, the interface of the arithmetic/logic unit and the content of the processing to be executed are described in the second field independently of the execution instruction, thereby making it possible to include in the instruction set the means for controlling pipelines and/or data paths. Accordingly, the present invention provides means that are effective not only in the execution of parallel processing near a program counter, but also in para-simultaneous execution of two or more objects and para-simultaneous execution of two or more functions. In other words, data processes and/or algorithms having different contexts cannot be performed simultaneously with the conventional instruction set, since this would require simultaneous processing according to program counters pointing to locations remote from each other. In contrast, by appropriately defining data flows with the instruction sets of the present invention, such processes are performed regardless of the program counters.

Accordingly, with the instruction sets of the present invention, when certain data paths are known in advance, from the application side, to be effective in improving parallel processing performance, such data paths are configured or arranged beforehand by the software using the second field. The data paths (data flows) thus implemented are then activated or executed at the instruction level as required by the software. The data paths are applicable not only to data processing for specific purposes but also to activating state machines; the data paths can therefore be applied extremely freely.

Moreover, the information in the second field allows a preparation cycle for the following instruction to be readily generated in advance. Conventionally, an operation must be performed using registers. However, buffering by the preparation cycle makes it possible to use memories (single port/dual port) or register files instead of registers. In the second field of the instruction set, instructions designating input/output between registers, or between buffers and memories, that are included in the processing unit can be described. Therefore, when the input/output between the registers or between the buffers and the memories is controlled in the second execution control unit or the second control step, the input/output to and from the memories is performed independently of the execution instruction.

This strengthens the association between individual instruction sequences and contributes to avoiding hardware resource contention prior to execution, thereby making it possible to respond quickly to requests for parallel simultaneous execution of a plurality of instructions and/or to external interrupt requests. In addition, since the memory can basically be regarded as a register, high-speed task switching can be implemented. It is also possible to employ a preloading-type high-speed buffer instead of a conventional cache memory, which cannot eliminate the first-fetch penalty. Therefore, a high-speed embedded system that incurs no penalty while ensuring a 100% hit ratio can also be implemented.

In other words, by allowing the memory to be regarded as a register, a plurality of asynchronous processing requests such as interrupts can be handled at high speed, thereby making it possible to deal with complicated data processing and continuous data processing in an extremely flexible manner. Moreover, since it does not take a long time to save and restore the registers, it becomes very easy to perform task switching at high speed. In addition, since the difference in access speed between the external memories and the internal memories is completely eliminated, the first-fetch penalty problem of cache memories is solved efficiently. Accordingly, CALL/RET and interrupt/IRET can be processed at high speed. Thus, an environment for responding to events can be configured easily, and a reduction in data processing performance due to events can be prevented.

Moreover, in the first or second field, it is possible to describe a plurality of execution instructions or preparation instructions like VLIW, and the first or second execution control unit may include a plurality of execution control portions for independently processing the plurality of independent execution instructions or preparation instructions that are described in the first or second field respectively. Thus, further improved performance can be obtained.

By implementing a data processing system that employs the control unit of the present invention as a core or peripheral circuitry, it is possible to provide a more economical data processing system having the advantages described above and having a high processing speed.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates an instruction set of the present invention.

FIG. 2 illustrates in more detail a Y field of the instruction set of FIG. 1.

FIG. 3 illustrates one example using the instruction set of FIG. 1.

FIG. 4 illustrates how data are stored in a register by the instruction set of FIG. 3.

FIG. 5 illustrates a data processing system for executing the instruction set of the present invention.

FIG. 6 illustrates a program executed with a conventional CPU or DSP.

FIG. 7 illustrates a program of the data processing system according to the present invention.

FIG. 8 illustrates a compiled program of the program of FIG. 7 using instruction sets of the present invention.

FIG. 9 illustrates another program of the data processing system according to the present invention.

FIG. 10 illustrates data flows configured by the program of FIG. 9.

FIG. 11 illustrates another data processing system for executing data processes by the instruction sets of the present invention.

FIG. 12 illustrates how different dedicated circuits are formed with different combinations of templates.

FIG. 13 illustrates one of the templates.

DESCRIPTION OF THE PREFERRED EMBODIMENT

Hereinafter, the present invention will be described in more detail with reference to the drawings. FIG. 1 shows the structure or format of the instruction set (instruction format) according to the present invention. The instruction set (instruction set of DAP/DNA) 10 in the present invention includes two fields: a first field called instruction execution basic field (X field) 11 and a second field called instruction execution preparation cycle field (additional field or Y field) 12 capable of improving efficiency of the subsequent instruction execution. The instruction execution basic field (X field) 11 specifies a data operation such as addition/subtraction, OR operation, AND operation and comparison, as well as the contents of various other data processings such as branching, and designates a location (destination) where the operation result is to be stored. Moreover, in order to improve the utilization efficiency of the instruction length, the X field 11 includes only information of the instructions for execution. On the other hand, the additional field (Y field) 12 is capable of describing an instruction or instructions (information) independent of the execution instruction in the X field of the same instruction set, and for example, is assigned for the information for execution preparation cycle of the subsequent instruction.

The instruction set 10 will be described in more detail. The X field 11 has an execution instruction field 15 describing the instruction operation or execution instruction (Execution ID) to a processing unit such as an arithmetic/logic unit, a field (type field) 16 indicating whether the Y field 12 is valid or invalid and the type of preparation instruction (preparation information) indicated in the Y field 12, and a field 17 showing a destination register. As described above, the description of the type field 16 is associated with the Y field 12 and can be defined independently of the descriptions of the other fields in the X field 11.

In the Y field 12, the preparation information defined by the type field 16 is described. The preparation information described in the Y field 12 is information for making an operation or other data processing ready for execution. Some specific examples thereof are shown in FIG. 2. First, it is noted again that the type field 16 in the X field 11 is for describing information independently or regardless of the information in the execution instruction field 15. In the Y field 12, it is possible to describe an address information field 26 that describes an address ID (AID) 21 and address information 22 whose intended use is defined by the AID 21, e.g., an address (ADRS) and an input/output address (ADRS.FROM/TO). This address information described in the Y field 12 is used for reading and writing between registers or buffers and memories (including register files), and block transfers like DMA are made ready by the information in the Y field. In addition to the input/output address (R/W), it is also possible to describe, as address information in the Y field 12, information such as an address indicating a branch destination upon execution of a branch instruction (fetch address, F) and a start address (D) upon parallel execution.

In the Y field, it is also possible to describe information 23 that defines an instruction of a register type, e.g., a defined immediate (imm) and/or information of registers (Reg) serving as source registers for the arithmetic operation or another logic operation instruction (including MOVE, memory read/write, and the like). In other words, it is possible to use the Y field 12 as a field 27 that defines sources for the subsequent execution instruction.

Furthermore, in the Y field 12, it is possible to describe information 25 that defines the interfaces (source, destination) and the processing content or function, and/or their combination, of an arithmetic/logic unit (ALU) or other data processing unit, e.g., a template having data path(s) ready to use. In other words, the Y field 12 is utilized as a field 28 for describing data flow designation instructions 25 that define reconfigurable data paths forming pipelines (data flows or data paths) for conducting a specific data processing. It is also possible to describe, in the Y field 12, information for starting or executing the data flow and information for terminating the same. Accordingly, the data flows provided with reconfigurable data paths defined by the Y field 12 enable execution of processes independently of a program counter for fetching a code from a code RAM.
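
The field layout described above can be pictured with the following minimal sketch. It is an illustration only and not the encoding of the embodiment: the names `PrepType`, `XField` and `InstructionSet`, the particular type codes, and the use of "NOP" as a placeholder execution instruction are all assumptions introduced here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Union


class PrepType(Enum):
    """Hypothetical codes for the type field 16 (not taken from FIG. 2)."""
    NONE = 0       # Y field invalid: no preparation information
    IMMEDIATE = 1  # Y field carries an immediate (imm)
    REGISTER = 2   # Y field names source registers (Reg)
    ADDRESS = 3    # Y field carries an address ID and address information
    DATAFLOW = 4   # Y field carries a data flow designation instruction


@dataclass(frozen=True)
class XField:
    execution_id: str            # execution instruction field 15, e.g. "MOVE"
    prep_type: PrepType          # type field 16: describes the Y field only
    destination: Optional[str]   # destination field 17, e.g. "R3"


@dataclass(frozen=True)
class InstructionSet:
    x: XField
    # Preparation information; how it is read depends on x.prep_type.
    y: Union[int, str, tuple, None]

    def y_is_valid(self) -> bool:
        return self.x.prep_type is not PrepType.NONE


# The Y field of one set prepares an immediate; the X field of the next uses it.
t_prev = InstructionSet(XField("NOP", PrepType.IMMEDIATE, None), 0x00001234)
t_next = InstructionSet(XField("MOVE", PrepType.NONE, "R3"), None)
```

Keeping the type code in the X field mirrors the description that the validity and kind of the Y field are known as soon as the X field is decoded.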

It should be understood that the format of the instruction set shown in FIGS. 1 and 2 is only one example of an instruction set having two independent instruction fields according to the present invention, and the present invention is not limited to the format shown in FIGS. 1 and 2. For example, the positions of the individual fields within the X and Y fields are not limited. The independent field, e.g., the type field 16, may alternatively be located at the head of the Y field 12. It is also possible to change the order of the X field 11 and Y field 12. In this example, since information about the Y field 12 (the type field 16) is included in the X field 11, whether or not preparation information is present in the Y field 12, as well as the type of the preparation information, is judged when the X field 11 describing the execution instruction is decoded.

In the example described below, the execution instruction and preparation instruction are described in the X field 11 and Y field 12 respectively. However, the instruction format also makes it possible to provide an instruction set in which no instruction is described (NOP is described) in the X field 11 or the Y field 12, so that only the other field is actually effective. The above instruction format also allows another instruction set in which a preparation instruction having operands, such as register information, relating to the execution instruction described in the X field 11, i.e., a preparation instruction that is not independent of the execution instruction in the X field 11, is described in the Y field 12 of the same instruction set 10. Such instruction sets may be mixed in the same program with the instruction sets of the present invention in which the X field 11 and Y field 12 are independent of each other and have no relation to each other within the same instruction set. A specific example is not described below for clarity of description of the invention. However, a program product having both the instruction sets 10 in which the respective descriptions in the X field 11 and Y field 12 are independent of each other and the instruction sets in which the respective descriptions in the X field 11 and Y field 12 are associated with each other, and a recording medium recording such a program, are also within the scope of the present invention.

FIG. 3 shows an example of the instruction set 10 of this invention. In the number j−1 instruction set 10, T(j−1), the type field 16 of the X field 11 indicates that a 32-bit immediate is described in the Y field 12 of the same instruction set. “#00001234H” is recorded as the immediate in the Y field 12 of the instruction set T(j−1). In the following number j instruction set T(j), “MOVE” is described in the execution instruction field 15 of the X field 11, and register R3 is indicated in the destination field 17. Accordingly, when this number j instruction set T(j) is fetched, an ALU of a control unit stores, in the register R3, the immediate “#00001234H” defined in the preceding instruction set T(j−1).

Thus, in the instruction set 10 of this embodiment (hereinafter, the number j instruction set 10 is referred to as instruction set T(j)), preparation for the execution instruction described in the instruction set T(j) is made by means of the preceding instruction set T(j−1). Accordingly, the whole of the processing to be executed by the ALU of the control unit cannot be known from the instruction set T(j) alone, but is uniquely determined from the two instruction sets T(j−1) and T(j). Moreover, in the execution instruction field 15 of the instruction set T(j−1), another execution instruction for another process prepared by the Y field 12 of the preceding instruction set is described independently of the Y field 12 of the instruction set T(j−1). Furthermore, in the type field 16 and Y field 12 of the instruction set T(j), other preparation information for another execution instruction described in the execution instruction field of the following instruction set is described.

In this embodiment, preparation information (preparation instruction) of the execution instruction described in the X field 11 of the instruction set T(j) is described in the Y field 12 of the immediately preceding instruction set T(j−1). In other words, in this example, the preparation instruction latency corresponds to one clock. However, preparation information may be described in another instruction set prior to the immediately preceding instruction set. For example, in a control program of the control unit having a plurality of ALUs, or for data flow control as described below, the preparation instruction need not be described in the immediately preceding instruction set. Provided that the state (environment or interface) of the ALUs or the configuration of templates set by preparation instructions is held until the instruction set having the execution instruction corresponding to that preparation instruction is fetched for execution, the preparation instruction can be described in the Y field 12 of an instruction set 10 that is performed several instruction cycles before the instruction set 10 having the execution instruction corresponding to the preparation instruction.

FIG. 4 shows the state where a data item is stored according to the instruction set of FIG. 3 in a register file or memory that functions as registers. A processor fetches the number j−1 instruction set T(j−1), and the immediate “#00001234H” is latched in a source register DP0.R of the ALU of the processor according to the preparation instruction in the Y field 12 thereof. Then, the processor fetches the following number j instruction set T(j), and the immediate thus latched is stored in a buffer 29 b in the execution cycle of the execution instruction “MOVE” in the X field 11. Thereafter, the data item in the buffer 29 b is saved at the address corresponding to the register R3 of the memory or the register file 29 a. Even if the storage destination is not a register but a memory, the instruction set 10 of this embodiment enables the data to be loaded or stored in the execution instruction cycle by conducting the process according to the preparation information prior to the execution instruction.
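
The two-step behaviour of T(j−1) and T(j) can be traced with a small model such as the one below. This is a sketch under simplifying assumptions: the class `ToyProcessor`, its `step` signature, and the "NOP" placed in the X field of T(j−1) are hypothetical, and the buffer 29 b is folded into a single dictionary standing in for the register file.

```python
class ToyProcessor:
    """Toy model with one source latch and a register file (illustrative only)."""

    def __init__(self):
        self.source_latch = None  # stands in for source register DP0.R
        self.registers = {}       # stands in for the register file 29 a

    def step(self, x_op, x_dest, y_type, y_value):
        # First control step: execute the X field using the source that an
        # earlier Y field has already latched.
        if x_op == "MOVE":
            self.registers[x_dest] = self.source_latch
        # Second control step: latch the Y field for a later instruction set.
        if y_type == "imm":
            self.source_latch = y_value


cpu = ToyProcessor()
cpu.step("NOP", None, "imm", 0x00001234)  # T(j-1): Y field prepares the immediate
latched = cpu.source_latch
cpu.step("MOVE", "R3", None, None)        # T(j): X field only performs the store
```

In the model the X field is handled before the Y field of the same instruction set, so that an execution instruction never sees preparation information from its own set; in the described hardware the two control steps proceed independently.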

FIG. 5 shows the schematic structure of a processor (data processing system) 38 having a control unit 30 capable of executing a program having the instruction sets 10 of this embodiment. Microcodes or microprograms 18 having the instruction sets 10 of this embodiment are saved in a code ROM 39. The control unit 30 includes a fetch unit 31 for fetching an instruction set 10 of the microprogram from the code ROM 39 according to a program counter whenever necessary, and a first execution control unit 32 having a function to decode the X field 11 of the fetched instruction set 10 so as to determine or assert the function of the ALU 34, and to select destination registers 34 d so as to latch the logic operation result of the ALU 34 therein.

The control unit 30 further includes a second execution control unit 33 having a function to decode the Y field 12 of the fetched instruction set 10 based on the information in the type field 16 of the X field 11 and to select source registers 34 s of the arithmetic processing unit (ALU) 34. This second execution control unit 33 is capable of interpreting the instruction or information in the Y field 12 independently of the description of the X field 11, except for the information in the type field 16. If the information described in the Y field 12 defines data flows, the second execution control unit 33 further has a function to select or set the source and destination sides of the ALU 34, i.e., determine the interface of the ALU 34, and to retain that state continuously until a predetermined clock or until a cancel instruction is given. Moreover, in the case where the information in the Y field 12 defines data flows, the second execution control unit 33 further determines the function (processing content) of the ALU 34 and retains that state for a predetermined period.

Accordingly, the first execution control unit 32 conducts a first control step of decoding the execution instruction in the X field 11 and proceeding with the operation or other data processes according to that execution instruction by the processing unit that is preset so as to be ready to execute the operation or other data processes of that execution instruction. On the other hand, independently of the content of the execution of the first execution control unit 32 and the first control step conducted thereby, the second execution control unit 33 performs a second control step of decoding preparation information in the Y field 12 and setting the state of the processing unit so as to be ready to execute the operation or other data processing.

This control unit 30 further includes a plurality of combinations of such execution control units 32, 33 and ALUs 34, making it possible to execute various processes. As a result, a DSP for high-speed image data processing, a general CPU or MPU capable of high-speed digital processing, and the like, can be configured using the control unit 30 as a core or peripheral circuitry.

FIGS. 6 to 9 show some sample programs executed by the control unit 30 of this embodiment. A sample program 41 shown in FIG. 6 is an example created so as to be executable by a conventional CPU or DSP. This program extracts the maximum value from a table starting at an address #START and is terminated upon detection of #END indicating the last data.
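
For orientation, the procedure that these sample programs implement can be written as the following sketch. It is not a transcription of FIG. 6: the function name `table_max`, the list used as memory, and the initial running maximum of 0 are assumptions, while the sentinel value is the one the description gives for #END.

```python
END = 0xFEFFFFFF  # value given in the description for #END


def table_max(memory, start):
    """Scan the table from `start` and return the largest entry before END."""
    addr = start
    r0 = 0                 # running maximum (assumed to start at 0)
    while True:
        r1 = memory[addr]  # current table entry
        addr += 1
        if r1 == END:      # first comparison: end-of-table check
            return r0
        if r1 > r0:        # second comparison: update the maximum
            r0 = r1


result = table_max([7, 42, 19, END], 0)
```

The two comparisons in the loop correspond to the two compare-and-branch pairs discussed for the compiled program.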

A program 42 shown in FIG. 7 corresponds to the same procedure as that of FIG. 6, but is converted into a form suitable for the control unit 30 that executes the instruction sets of the present invention. The program 42 is generated for executing two instructions with a single instruction set. The program shown in FIG. 7 is converted through a compiler into an execution program of the instruction sets of the present invention so as to be executed by the control unit 30.

FIG. 8 shows the compiled program 43 having instruction sets 10 of the present invention. The program product 18 having such instruction sets 10 is provided in a form recorded or stored in the ROM 39, RAM, or another appropriate recording medium readable by the data processing system. Moreover, the program product 43 or 18 embedded in a transmission medium exchangeable in a network environment may also be distributed. As is well understood by comparing the program 43 with the program 42, preparation for the execution instructions 15 of the second instruction set 10 is made in the Y field 12 of the first instruction set 10. In the first instruction set 10, the type field 16 indicates that an immediate is described in the Y field 12 as preparation information. The second execution control unit 33 decodes the Y field 12 and provides the immediate to source caches or registers of the ALU 34. Therefore, by the second instruction set 10, the execution instructions 15 are executed on the ALU 34 that has been made ready for those execution instructions. Namely, at the time when the second instruction set 10 is executed, the “MOVE” instructions in the execution instruction field 15 are simply executed to the registers defined in the destination field 17.

Similarly, in the Y field 12 of the second instruction set 10, instructions to set source registers are described as preparation information of the execution instructions “MOVE” and “ADD” in the execution instruction field 15 of the following third instruction set 10. The type field 16 defines that the registers and immediate are described in the Y field 12.

In the program 43, the third and the following instruction sets 10 are decoded in the same manner as described above. Preparation information for the execution instructions 15 of the following fourth instruction set 10 is described in the type field 16 and Y field 12 of the third instruction set 10. The execution instructions 15 of the fourth instruction set 10 are comparison (CMP) and conditional branching (JCC). Accordingly, in the type field 16 and Y field 12 of the third instruction set 10, a register R1 to be compared in the following execution instruction 15, the immediate data #END (#FEFFFFFFH), and the address of the branch destination #LNEXT (#00000500H) are described as preparation information. Accordingly, upon executing the execution instructions 15 of the fourth instruction set 10, the comparison result is obtained in that execution cycle, because the input data have been set to the arithmetic processing unit 34 that operates as a comparison circuit. Moreover, the jump address has been set to the fetch address register. Therefore, by the conditional branching of the execution instruction 15, another instruction set 10 at the transition address is fetched in that execution cycle, based on the comparison result.

In the type field 16 and Y field 12 of the fourth instruction set 10, information on the registers to be compared (R0 and R1) and the address of the branch destination #LOOP (#00000496H) are described as preparation information of the execution instructions 15 of the following fifth instruction set 10, i.e., comparison (CMP) and conditional branching (JCC). Accordingly, like the fourth instruction set, upon executing the fifth instruction set 10, the comparison and conditional branching are performed in that execution cycle, because the interface of the arithmetic processing unit 34 has already been made ready to execute the CMP and JCC described in the X field 11.

In the Y field 12 of the fifth instruction set 10, source register information (R1) and the address of the transition destination #LOOP are described as preparation information of the execution instructions of the following sixth instruction set 10, i.e., movement (MOVE) and branching (JMP). Accordingly, when the sixth instruction set 10 is executed, the data item is stored in the destination register R0 and another instruction is fetched from the address of the transition destination #LOOP in that same execution cycle.

Thus, according to the instruction set of the present invention, the execution instruction is separated from the preparation instruction that describes interfaces and/or other information for executing that execution instruction. Moreover, the preparation instruction is described in the instruction set that is fetched prior to that execution instruction. Accordingly, by the execution instructions described in each instruction set, only the corresponding arithmetic operation is executed, because the data have already been read or assigned to the source sides of the ALU 34. Accordingly, excellent AC characteristics and improved execution frequency characteristics are obtained. Moreover, although the timings of operations with respect to the execution instruction are different from those of the conventional pipeline, operations such as instruction fetching, register decoding, and other processings are performed in a stepwise manner like the conventional pipeline. Thus, the throughput is also improved.

In addition, the program of this embodiment is capable of describing two instructions in a single instruction set. Therefore, by parallel execution of a plurality of instructions near the program counter like VLIW, the processing speed is further improved.

Moreover, in this program 43, conditional branching is described in the execution instruction field 15 of the fourth instruction set, and the address of that branch destination is described in the Y field 12 of the preceding third instruction set. Accordingly, the address of the branch destination is set to the fetch register upon or before execution of the fourth instruction set. Thus, when the branch conditions are satisfied, the instruction set at the branch destination is fetched and/or executed without any penalty. It is also possible to pre-fetch the instruction at the branch destination, so that preparation for executing the execution instruction at the branch destination can be made in advance. Accordingly, even the instruction at the branch destination is executed without loss of even one clock. Thus, the processing is accurately defined on a clock-by-clock basis.
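
The effect of placing the branch destination in an earlier Y field can be illustrated with the fetch-sequence model below. It is a sketch only: `trace_fetches`, the tuple layout `(x_op, y_target)`, the "HALT" operation and the small address values are hypothetical and are not taken from the program 43.

```python
def trace_fetches(program, condition, max_cycles=16):
    """Return the addresses fetched, one per cycle, for a toy X/Y program.

    `program` maps an address to (x_op, y_target); y_target is the branch
    destination that the Y field prepares for a later instruction set.
    """
    pc = 0
    branch_target = None  # fetch address register, loaded by an earlier Y field
    fetched = []
    for _ in range(max_cycles):
        if pc not in program:
            break
        x_op, y_target = program[pc]
        fetched.append(pc)
        if x_op == "HALT":
            break
        # The destination is already in the fetch address register, so the
        # branch only has to select between two known addresses.
        if x_op == "JCC" and condition():
            next_pc = branch_target
        else:
            next_pc = pc + 1
        if y_target is not None:
            branch_target = y_target
        pc = next_pc
    return fetched


program = {
    0: ("NOP", 5),      # Y field prepares destination 5 for the next set
    1: ("JCC", None),   # X field branches using the prepared destination
    2: ("HALT", None),  # fall-through path
    5: ("HALT", None),  # branch destination
}
taken = trace_fetches(program, lambda: True)
not_taken = trace_fetches(program, lambda: False)
```

In both traces each cycle fetches exactly one useful instruction set, which is the property the paragraph above describes as executing the branch destination without loss of a clock.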

FIG. 9 further shows a program 44 of the present invention, which defines data flows using the Y field 12 of the instruction set 10 of the present invention and executes the same procedure as described above based on those data flows. Among the data flow designation instructions 25 described in this program 44, “DFLWI” is an instruction for initializing a data flow, and “DFLWC” is an instruction defining information of connections (information of interfaces) and processing content (function) of the arithmetic processing unit 34 forming the data flow (data path). “DFLWS” is an instruction defining the termination conditions of the data flow. The instruction located at the end, “DFLWS”, is for inputting data to the data flow thus defined and actuating the processing of the data path. These data flow designation instructions 25 are described in the Y field 12 as preparation information and decoded by the second execution control unit 33, so that the structures (configurations) for conducting the data processes are set in the processing units 34.

When the program 44 shown in FIG. 9 is executed, the second execution control unit 33 sets, as the second control step, the input and/or output interfaces of the processing unit independently of the time or timing of execution of that processing unit, and also defines the contents of the processing to be executed in the processing unit according to the specification of the data flow in the program. Moreover, the second execution control unit 33 also functions as a scheduler 36 so as to manage the schedule for retaining the interface of each processing unit in the second control step.

Accordingly, as shown in FIG. 10, the second execution control unit 33 functioning as scheduler 36 defines the respective interfaces (input/output) and contents or functions of the processing of three arithmetic processing units 34, and retains those states and/or configurations until the termination conditions are satisfied. Accordingly, through the data flow or data path configured with these arithmetic processing units 34, the same processing as that shown in FIG. 6 proceeds in sequence independently of the program counter. In other words, by designating the data flow, dedicated circuitry for that processing is provided in the control unit 30 prior to the execution by the three arithmetic processing units 34. Thus, the processing of obtaining the maximum value is executed independently of the control of the program counter. The data flow is terminated if the ALU 34 functioning as DP1.SUB judges that DP1.R1 corresponds to #END.
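
One way to picture a data flow that runs without a program counter is as a chain of three units that pass values along once they are connected. The sketch below uses Python generators for this. The division of work among `load_unit`, `terminate_unit` and `max_unit` is an assumption made for illustration and is not the assignment of functions shown in FIG. 10.

```python
END = 0xFEFFFFFF  # value given in the description for #END


def load_unit(memory, start):
    """Unit 1 (assumed role): stream table entries from consecutive addresses."""
    addr = start
    while True:
        yield memory[addr]
        addr += 1


def terminate_unit(stream, sentinel):
    """Unit 2 (assumed role): end the data flow when the sentinel appears."""
    for value in stream:
        if value == sentinel:
            return
        yield value


def max_unit(stream):
    """Unit 3 (assumed role): keep the largest value seen so far."""
    best = 0  # assumed initial value
    for value in stream:
        if value > best:
            best = value
    return best


# "Configuring" the data flow is just connecting the interfaces once;
# afterwards the data move through the chain with no branch instructions.
table = [7, 42, 19, END]
flow_result = max_unit(terminate_unit(load_unit(table, 0), END))
```

The connections are fixed before any data are supplied and stay in place until the termination condition is met, which parallels the retained interfaces managed by the scheduler 36.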

Thus, as is shown in FIG. 9, definition of the data flow enables the same processing as that of the program shown in FIG. 6 or 7 without using any branch instruction. Accordingly, although the control unit 30 is a general-purpose unit, it performs specific processing efficiently and at an extremely high speed, like a control unit having dedicated circuitry for that specific processing.

The instruction set and the control unit according to the present invention make it possible to provide data flows or para-data flows for various processings in the control unit. These data flows can also be applied as templates for executing other processings or programs. This means that, using software, the hardware is modified at any time to the configuration suitable for the specific data processing. In addition, such configurations are realized by other programs or hardware. It is also possible to set a plurality of data flows, and a multi-command stream can be defined in the control unit by software. This significantly facilitates parallel execution of a plurality of processings, and their execution can easily be controlled in various ways by programming.

FIG. 11 shows a schematic structure of a data processing system provided as a system LSI 50, having a plurality of processing units (templates) capable of defining a data flow by the instruction set 10 including the X field 11 and Y field 12 of this invention. This system LSI 50 includes a processor section 51 for conducting data processings, a code RAM 52 storing a program 18 for controlling the processings in the processor section 51, and a data RAM 53 that stores other control information or processing data and serves as a temporary work memory. The processor section 51 includes a fetch unit (FU) 55 for fetching a program code, a general-purpose data processing unit (multi-purpose ALU, first control unit) 56 for conducting versatile processing, and a data flow processing unit (DFU, second control unit) 57 capable of processing data in a data flow scheme.

The LSI 50 of this embodiment decodes the program code that includes a set of X field 11 and Y field 12 in the single instruction set 10 and executes the processing accordingly. The FU 55 includes a fetch register (FR(X)) 61 x for storing the instruction in the X field 11 of the fetched instruction set 10, and a fetch register (FR(Y)) 61 y for storing the instruction in the Y field 12 thereof. The FU 55 further includes an X decoder 62 x for decoding the instruction latched in the FR(X) 61 x, and a Y decoder 62 y for decoding the instruction latched in the FR(Y) 61 y. The FU 55 further includes a register (PC) 63 for storing an address of the following instruction set according to the decode result of these decoders 62 x and 62 y, and the PC 63 functions as a program counter. The subsequent instruction set is fetched at any time from a predetermined address of the program stored in the code RAM 52.

In this LSI 50, the X decoder 62 x functions as the aforementioned first execution control unit 32. Therefore, the X decoder 62 x conducts the first control step of the present invention, based on the execution instruction described in the X field 11 of the instruction set 10. The Y decoder 62 y functions as the second execution control unit 33. Accordingly, the Y decoder 62 y performs the second control step of the present invention, based on the preparation information described in the Y field 12 of the instruction set 10. Therefore, in the control of this data processing system, the step of fetching the instruction set of the present invention is performed in the fetch unit 55; the first control step of decoding the execution instruction in the first field and proceeding with the operation or data processing of that execution instruction by the processing unit that has been preset so as to be ready to execute the operation or data processing of that execution instruction is performed in the X decoder 62 x; and, independently of the first control step, the second control step of decoding preparation information in the second field and setting the state of the processing unit so as to be ready to execute the operation or data processing is performed in the Y decoder 62 y.

The multi-purpose ALU 56 includes the arithmetic unit (ALU) 34 as described in connection with FIG. 5 and a register group 35 for storing input/output data of the ALU 34. Provided that the instructions decoded in the FU 55 are the execution instruction and/or preparation information of the ALU 34, a decode signal φx of the X decoder 62 x and a decode signal φy of the Y decoder 62 y are supplied respectively to the multi-purpose ALU 56, so that the described processing is performed in the ALU 34 as explained above.

The DFU 57 has a template section 72 where a plurality of templates 71 for configuring one of a plurality of data flows or pseudo data flows for various processings are arranged. As described above in connection with FIGS. 9 and 10, each template 71 is a processing unit (processing circuit) having a function as a specific data path or data flow, such as the arithmetic processing unit (ALU). When the Y decoder 62 y decodes the data flow designation instructions 25 described as preparation information in the Y field 12, the respective interfaces and the contents or functions of the processing in the templates 71, i.e., the processing units of the DFU 57, are set based on the signal φy.

Accordingly, it is possible to change the respective connections of the templates 71 and the processes in those templates 71 by the data flow designation instructions 25 described in the Y field 12. Thus, with a combination of these templates 71, data path(s) suitable for the specific data processing are flexibly configured in the template section 72 by means of the program 18. Thus, dedicated circuitry for the specific processing is provided in the processor 51, whereby the processing therein is conducted independently of the control of the program counter. In other words, because the data flow designation instructions 25 make it possible to change the respective inputs/outputs of the templates 71 and the processes in the templates 71 by software, the hardware of the processor 51 is modified or reconfigured at any time to the configuration suitable for the specific data processing.

As shown in FIG. 12(a), in order to perform some process on the input data φin to obtain the output data φout by the DFU 57 of this processor 51, it is possible to set the respective interfaces of the templates 71 by the data flow designation instructions 25 so that the data processing is performed with the templates 1-1, 1-2 and 1-3 being connected in series with each other as shown in FIG. 12(b). Similarly, for the other templates 71 in the template section 72, it is possible to set their respective interfaces so as to configure data paths or data flows with appropriate combinations of a plurality of templates 71. Thus, a plurality of dedicated or special processing units or dedicated data paths 73 that are suitable for processing the input data φin are configured at any time in the template section 72 by means of the program 18.

On the other hand, in the case where the process to be performed on the input data φin is changed, it is possible to change the connection between the templates 71 by the data flow designation instructions 25, as shown in FIG. 12(c). The Y decoder 62 y decodes the data flow designation instructions 25 so as to change the respective interfaces of the corresponding templates 71. Such a control process (second control step) of the Y decoder 62 y enables one or a plurality of data paths 73 suitable for executing different processing to be configured in the template section 72 with the templates 1-1, 2-n and m-n connected in series with each other.
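
The series connection of FIG. 12(b) and its replacement by the connection of FIG. 12(c) may be pictured with the following minimal Python sketch. The class TemplateSection, the method designate, and the arithmetic assigned to each template are assumptions made purely for illustration; the specification does not define the function of the individual templates.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for templates 71. The arithmetic in each one is an
# assumption for illustration; only the template labels come from FIG. 12.
TEMPLATES: Dict[str, Callable[[int], int]] = {
    "1-1": lambda x: x + 1,
    "1-2": lambda x: x * 2,
    "1-3": lambda x: x - 3,
    "2-n": lambda x: x * x,
    "m-n": lambda x: -x,
}


class TemplateSection:
    """Toy model of the template section 72 holding one series data path."""

    def __init__(self) -> None:
        self.path: List[str] = []

    def designate(self, names: List[str]) -> None:
        """Model of a data flow designation: rewire the series connection."""
        for name in names:
            if name not in TEMPLATES:
                raise KeyError(f"unknown template {name}")
        self.path = list(names)

    def run(self, phi_in: int) -> int:
        """Pass the input data through the currently connected templates."""
        value = phi_in
        for name in self.path:
            value = TEMPLATES[name](value)
        return value


section = TemplateSection()

# Connection corresponding to FIG. 12(b): templates 1-1, 1-2, 1-3 in series.
section.designate(["1-1", "1-2", "1-3"])
out_b = section.run(5)

# Connection corresponding to FIG. 12(c): templates 1-1, 2-n, m-n in series.
section.designate(["1-1", "2-n", "m-n"])
out_c = section.run(5)
```

In this sketch only the list of connected templates changes between the two runs; the templates themselves are fixed, which mirrors the point that the interfaces are redefined rather than the logic being re-mapped.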

In addition, a processing unit formed from a single template 71 or a combination of a plurality of templates 71 can also be assigned to another processing or another program that is executed in parallel. In the case where a plurality of processors 51 are connected to each other through an appropriate bus, it is also possible to configure a train (data path) 73 having the templates 71 combined for another data processing that is mainly performed by another processor 51. It is therefore possible to use the data processing resources, i.e., the templates 71, extremely effectively.

Moreover, unlike the FPGA intended to cover even implementation of a simple logic gate such as “AND” and “OR”, the template 71 of the present invention is a higher-level data processing unit including therein some specific data path which basically has a function as ALU or other logic gates. The respective interfaces of the templates 71 are defined or redefined by the data flow designation instructions 25 so as to change the combination of the templates 71. Thus, a larger data path suitable for desired specific processing is configured. At the same time, the processing content or processing itself performed in the templates 71 can also be defined by the data flow designation instructions 25 changing the connection of the ALU or other logic gates or the like within the template 71. Namely, the processing content performed in the templates 71 is also defined and varied by selecting a part of the internal data path in the template 71.

Accordingly, in the case where the hardware of the DFU 57 having a plurality of templates 71 of this example arranged therein is reconfigured for the specific data processing, re-mapping of the entire chip as in the FPGA or even re-mapping on the basis of a limited logic block is not necessary. Instead, by switching the data paths previously provided in the templates 71 or in the template section 72, or by selecting a part of the data paths, the desired data paths are implemented using the ALUs or logic gates prepared in advance. In other words, within the template 71, connections of the logic gates are reset or reconfigured only to the minimum extent required, and even between the templates 71, the connections are reset or reconfigured only within a minimum required range. This enables the hardware to be changed to the configuration suitable for the specific data processing in a very short or limited time, in units of clocks.

Since FPGAs incorporate no logic gates in advance, they are extremely versatile. However, FPGAs include a large number of wirings that are unnecessary to form logic circuitry for implementing functions of a specific application, and such redundancy hinders reduction in length of signal paths. An FPGA occupies a larger area than an ASIC that is specific to the application to be executed, and also has degraded AC characteristics. In contrast, the processor 51 employing the templates 71 of this embodiment, which incorporate appropriate logic gates in advance, is capable of preventing a huge wasteful area from being produced as in the FPGA, and also capable of improving the AC characteristics. Accordingly, the data processing unit 57 in this embodiment based on the templates 71 is a reconfigurable processor capable of changing the hardware by means of a program. Thus, in this invention, it is possible to provide the data processing system having both a higher-level flexibility of software and higher-speed performance of hardware compared to a processor employing FPGAs.

Appropriate logic gates are incorporated in these templates 71 in advance; therefore, the logic gates required for performing the specific application are implemented at an appropriate density. Accordingly, the data processing unit using the templates 71 is economical. In the case where the data processor is formed from FPGA, frequent downloading of a program for reconfiguring the logic must be considered in order to compensate for reduction in packaging density. The time required for such downloading also reduces the processing speed. In contrast, since the processor 51 using the templates 71 has a high packaging density, the necessity of compensating for reduction in the density is reduced, and frequent reconfiguration of the hardware is less often required. Moreover, reconfigurations of the hardware are controlled in units of clocks. In these respects, it is possible to provide a compact, high-speed data processing system capable of reconfiguring the hardware by means of software, which is different from the FPGA-based reconfigurable processor.

Moreover, the DFU 57 shown in FIG. 11 includes a configuration register (CREG) 75 capable of collectively defining or setting the respective interfaces and content of processings (hereinafter referred to as configuration data) of the templates 71 arranged in the template section 72, and a configuration RAM (CRAM) 76 storing a plurality of configuration data Ci (hereinafter, i represents an appropriate integer) to be set to the CREG 75. An instruction like “DFSET Ci” is provided as an instruction of the data flow designators 25. When the Y decoder 62 y decodes this instruction, desired configuration data among the configuration data Ci stored in the CRAM 76 is loaded into the CREG 75. As a result, configurations of the plurality of templates 71 arranged in the template section 72 are changed collectively. Alternatively, configuration may be changed on the basis of a processing block formed from a plurality of templates 71.
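
The collective change of configuration by “DFSET Ci” may be sketched as follows. This is only an illustrative model under stated assumptions: the class DataFlowUnit, the representation of configuration data as a mapping from template label to a setting string, and the setting names themselves are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass, field
from typing import Dict

# Configuration data Ci modelled as one setting per template. The template
# labels and setting strings are hypothetical placeholders.
ConfigData = Dict[str, str]


@dataclass
class DataFlowUnit:
    """Toy model of the CRAM 76 (stored Ci) and the CREG 75 (active setting)."""

    cram: Dict[int, ConfigData] = field(default_factory=dict)
    creg: ConfigData = field(default_factory=dict)

    def dfset(self, i: int) -> None:
        """Model of "DFSET Ci": load configuration data Ci into the CREG.

        A copy is taken so that later changes to the active configuration
        do not alter the data stored in the CRAM.
        """
        self.creg = dict(self.cram[i])


dfu = DataFlowUnit(cram={
    0: {"1-1": "add", "1-2": "pass", "1-3": "compare"},
    1: {"1-1": "sub", "1-2": "add", "1-3": "pass"},
})

# One instruction changes the setting of every template at once.
dfu.dfset(1)
```

The design point illustrated is that a single short instruction carries only the index i, while the bulk of the configuration information stays in the CRAM.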

It is also possible to set or change the configuration of an individual template 71 when the Y decoder 62 y decodes a data flow designation instruction 25 such as DFLWI or DFLWC explained above. In addition, as mentioned above, since the DFU 57 is capable of changing, with a single instruction, the configurations of a plurality of templates 71, which requires a large amount of information, the instruction efficiency is improved and the time expended for reconfiguration is reduced.

The DFU 57 further includes a controller 77 for downloading the configuration data into the CRAM 76 on a block-by-block basis. In addition, “DFLOAD BCi” is provided as an instruction of the data flow designator 25. When the Y decoder 62 y decodes this instruction, a number of configuration data Ci for the ongoing processing or the processing that would occur in the future are downloaded in advance into the configuration memory, i.e., the CRAM 76, from among a large number of configuration data 78 prepared in advance in the data RAM 53 or the like. With this structure, a small-capacity and high-speed associative memory or the like can be applied as the CRAM 76, and the hardware is reconfigured flexibly and even more quickly.
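
The block-by-block download by “DFLOAD BCi” may be pictured with the following minimal sketch. The class ConfigController, the capacity value, the block names and the configuration strings are all assumptions introduced for illustration; the specification does not state the CRAM capacity or the block format.

```python
from typing import Dict

# Assumed capacity of the small, fast CRAM 76 (number of entries).
CRAM_CAPACITY = 4

# Configuration data 78 prepared in advance in the data RAM 53, grouped into
# blocks. Block names and contents are hypothetical.
DATA_RAM: Dict[str, Dict[int, str]] = {
    "BC0": {0: "cfg-0", 1: "cfg-1"},
    "BC1": {4: "cfg-4", 5: "cfg-5"},
}


class ConfigController:
    """Toy model of the controller 77 that fills the CRAM 76 block by block."""

    def __init__(self, data_ram: Dict[str, Dict[int, str]]) -> None:
        self.data_ram = data_ram
        self.cram: Dict[int, str] = {}

    def dfload(self, block: str) -> None:
        """Model of "DFLOAD BCi": replace the CRAM contents with one block."""
        entries = self.data_ram[block]
        if len(entries) > CRAM_CAPACITY:
            raise ValueError("block does not fit in the CRAM")
        self.cram = dict(entries)

    def dfset(self, i: int) -> str:
        """Model of "DFSET Ci": only configuration data resident in the CRAM
        can be selected; a missing index raises KeyError."""
        return self.cram[i]


ctrl = ConfigController(DATA_RAM)
ctrl.dfload("BC0")
first = ctrl.dfset(1)
missing_before = 5 in ctrl.cram  # C5 is not resident until BC1 is loaded
ctrl.dfload("BC1")
second = ctrl.dfset(5)
```

Because only the block needed for the ongoing or upcoming processing is resident, the CRAM in this model stays small regardless of how many configurations exist in the data RAM.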

FIG. 13 shows an example of the template 71. This template 71 is capable of exchanging data with another template 71 through a data flow RAM (DFRAM) 79 prepared in the DFU 57. The processing result of another template 71 is input through an I/O interface 81 to input caches 82 a to 82 d, and is then processed and output to output caches 83 a to 83 d. This template 71 has a data path 88 capable of performing the following processing on data A, B, C and D respectively stored in the input caches 82 a to 82 d, and of storing the operation result in the output cache 83 b and storing the comparison result in the output cache 83 c. The processing result of the template 71 is again output to another template through the I/O interface 81 and DFRAM 79.

IF A=? THEN (C+B)=D ELSE (C−B)=D  (A)

This template 71 has its own configuration register 84. The data stored in the register 84, in this template 71, controls a plurality of selectors 89 so as to select a signal to be input to the logic gates such as control portion 85, adder 86 and comparator 87. Accordingly, by changing the data in the configuration register 84, another processing using a part of the data path 88 can be performed in the template 71. For example, in the template 71, the following processing is also provided without using the control portion 85.

(B+C)=D
(B−C)=D  (B)

Similarly, by changing the data in the configuration register 84, a part of the data path 88 can be used so that the template 71 is utilized as a condition determination circuit using the control portion 85, an addition/subtraction circuit using the adder 86, or a comparison circuit using the comparator 87. These logic gates are formed from dedicated circuitry that is incorporated in the template 71; therefore there are no wasteful parts in terms of the circuit structure and the processing time. In addition, it is possible to change the input and output data configurations to/from the template 71 by the interface 81 that is controlled by the configuration register 84. Thus, the template 71 becomes all or a part of the data flow for performing the desired data processing.
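
The selection of a part of the data path 88 by the configuration register 84 may be sketched as follows, covering processing (A) and the two forms of processing (B). This is an illustrative model only. The class Template, the enumeration Mode and its member names are hypothetical. The condition tested on A is not given in the text, so it is supplied by the caller as a predicate; the predicate used in the usage lines is an arbitrary placeholder.

```python
from enum import Enum
from typing import Callable


class Mode(Enum):
    """Hypothetical values of the configuration register 84."""
    COND_ADD_SUB = 0  # processing (A): uses the control portion 85
    ADD = 1           # processing (B), first form: (B+C)=D
    SUB = 2           # processing (B), second form: (B-C)=D


class Template:
    """Toy model of a template 71 whose data path is chosen by its register."""

    def __init__(self, condition: Callable[[int], bool]) -> None:
        # The condition on A is unspecified in the source; it is injected here.
        self.condition = condition
        self.config = Mode.COND_ADD_SUB

    def configure(self, mode: Mode) -> None:
        """Rewrite the configuration register."""
        self.config = mode

    def run(self, a: int, b: int, c: int) -> int:
        """Return D for inputs A, B, C under the current configuration."""
        if self.config is Mode.COND_ADD_SUB:
            return c + b if self.condition(a) else c - b
        if self.config is Mode.ADD:
            return b + c
        return b - c


# Placeholder condition, assumed only for this example.
template = Template(condition=lambda a: a == 0)

d_then = template.run(0, 2, 5)   # condition holds: C+B
d_else = template.run(1, 2, 5)   # condition fails: C-B

template.configure(Mode.ADD)
d_add = template.run(1, 2, 5)    # A is ignored: B+C

template.configure(Mode.SUB)
d_sub = template.run(1, 2, 5)    # A is ignored: B-C
```

In the ADD and SUB modes the input A and the condition are not consulted at all, which corresponds to the processing being provided without using the control portion 85.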

This template 71 is also capable of rewriting the data in its own configuration register 84, based on either the data from the aforementioned CREG 75 or the data from the Y decoder (YDEC) 62 y of the FU 55, and the selection between them is controlled by a signal from the Y decoder 62 y. Namely, configuration of this template 71 is controlled by the Y decoder 62 y, or the second control step performed by the Y decoder 62 y, according to the data flow designation instructions 25. Therefore, two kinds of hardware reconfiguration are possible: one is to change the hardware configuration of the template 71, based on the DFSET instruction or the like, together with other template(s) according to the configuration data Ci stored in the CRAM 76; the other is to select a part of the specific data path 88 of the template 71 by the data in the configuration register 84 set by the data flow designation instruction 25.

Accordingly, configuration of the templates 71 is changed by the data flow designation instructions 25 either individually or in groups or blocks, whereby the data path of the processor 51 is flexibly reconfigured.

The structure of the template 71 is not limited to the above embodiment. It is possible to provide appropriate types and numbers of templates having logic gates for combining, selecting a part of the inner data path, and changing the combination of the templates 71 for performing a multiplicity of data processings. More specifically, in the present invention, somewhat compact data paths are provided as several types of templates. Thus, by designating a combination of the data paths, the data-flow-type processings are implemented, whereby the specific processings are performed with improved performance. In addition, any processing that cannot be handled with the templates is performed with the functions of the multi-purpose ALU 56 of the processor 51. Moreover, in the multi-purpose ALU 56 of this processor, the penalty generated upon branching and the like is minimized by the preparation instructions described in the Y field 12 of the instruction set 10. Therefore, the system LSI 50 incorporating the processor 51 of this embodiment makes it possible to provide a high-performance LSI capable of changing the hardware as flexibly as describing the processing by programs, and it is suitable for high-speed and real-time processing. This LSI also deals flexibly with a change in application or specification without the reduction in processing performance that would otherwise result from the change in specification.

In the case where the summary of the application to be executed with this system LSI 50 is known at the time of developing or designing the system LSI 50, it is possible to configure the template section 72 mainly with the templates having configuration suitable for the processing of that application. As a result, an increased number of data processings can be performed with the data-flow-type processing, thereby improving the processing performance. In the case where a general-purpose LSI is provided by the system LSI 50, it is possible to configure the template section 72 mainly with the templates suitable for the processing that often occurs in a general-purpose application such as floating-point operation, multiplication and division, image processing or the like.

Thus, the instruction set and the data processing system according to the present invention make it possible to provide an LSI having a data flow or pseudo data flow performing various processings, and by using software, the hardware for executing the data flow can be changed at any time to the configuration suitable for a specific data processing. Moreover, the aforementioned architecture for conducting the data-flow-type processing by combination of the templates, i.e., the DFU 57 or template region 72, can be incorporated into the control unit or the data processing system such as a processor independently of the instruction set 10 having the X field 11 and Y field 12. Thus, it is possible to provide a data processing system capable of conducting the processing at a higher speed, changing the hardware in a shorter time, and also having better AC characteristics, as compared to the FPGA.

It is also possible to configure a system LSI that incorporates the DFU 57 or template region 72 together with a conventional general-purpose embedded processor, i.e., a processor operating with mnemonic codes. In this case, any processing that cannot be handled with the templates 71 can be conducted with the general-purpose processor. As described above, however, the conventional processor has problems such as branching penalty and wasting of clocks for preparation of registers for arithmetic processing. Accordingly, it is desirable to apply the processor 51 of this embodiment capable of decoding the instruction set 10 having the X and Y fields for execution.

Moreover, with the processor 51 and instruction set 10 of this embodiment, configurations of the DFU 57 are set or changed before execution of the data processing, in parallel with another processing, by the Y field 12. This is advantageous in terms of processing efficiency and program efficiency. The program efficiency is also improved by describing a conventional mnemonic instruction code and data-flow-type instruction code in a single instruction set. The function of the Y field 12 of the instruction set 10 of this embodiment is not limited to describing the data-flow-type instruction code as explained above.
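
The way in which preparation described in the Y field takes effect for a subsequent execution instruction, while the X field of the same instruction set is executed under the previously prepared state, may be pictured with the following minimal sketch. The tuple encoding of an instruction set, the function execute and the two configurations are assumptions made for illustration and do not represent the actual instruction format.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical configurations selectable from the Y field; the arithmetic is
# an assumption chosen only to make the effect of each configuration visible.
CONFIGS: Dict[int, Callable[[int], int]] = {
    0: lambda v: v + 1,
    1: lambda v: v * 10,
}

# One instruction set: (X field operand or None for no execution,
#                       Y field configuration index or None for no preparation)
InstructionSet = Tuple[Optional[int], Optional[int]]


def execute(program: List[InstructionSet]) -> List[int]:
    """Execute X fields under the state prepared by earlier Y fields."""
    active = CONFIGS[0]  # assumed initial state
    results: List[int] = []
    for x_operand, y_config in program:
        if x_operand is not None:
            # The X field runs with the configuration prepared beforehand.
            results.append(active(x_operand))
        if y_config is not None:
            # The Y field of this instruction set prepares the state used by
            # the X field of the following instruction sets.
            active = CONFIGS[y_config]
    return results


program: List[InstructionSet] = [
    (2, 1),     # execute under config 0, prepare config 1
    (2, None),  # execute under config 1
    (3, 0),     # execute under config 1, prepare config 0
    (3, None),  # execute under config 0
]
results = execute(program)
```

No instruction set in this program is spent solely on reconfiguration; each preparation rides along with an execution instruction, which is the efficiency point made above.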

The processor according to the present invention is capable of changing the physical data path configuration or structure by the Y field 12 prior to execution. In contrast, in the conventional art, a plurality of processors are connected to each other only through a shared memory. Therefore, even if there is a processor in the idle state, the internal data processing unit of that processor cannot be utilized from the outside. In the data processor according to the present invention, setting an appropriate data flow enables unused hardware in the processor to be used by another control unit or data processor.

As secondary effects, in the control unit of the present invention and the processor using the same, efficiency of the instruction execution sequence is improved, and independence and an improved degree of freedom (availability) of the internal data path are ensured. Therefore, the processings are successively executed as long as the executing hardware is available, even if instruction sequences for processings having contexts of completely different properties are simultaneously supplied.

Now, the advantages of the cooperative design of hardware and software are frequently pointed out, and the combination of the instruction set and the control unit of the present invention provides an answer to the question of how algorithms and/or data processes requested by the user can be implemented in an efficient and economical manner within the allowable hardware costs. For example, based on both the data and/or information relating to the instruction set of the present invention (the former DAP/DNA) reflecting configurations of the data paths that are already implemented, and the hardware and/or sequences subsequently added for executing the process, a new type of combination corresponding to the new data path (data flow) described with software, which is the optimal solution for the process and contributes to improving performance, is derived while minimizing the hardware costs.

In the conventional hardware, configuration is less likely to be divided into elements. Therefore, there is no flexibility in combination of the elements, and basically, the major solution for improving performance is to add a single new data path. Therefore, the conventional architecture is hard to evaluate numerically either in terms of accumulating information for improving performance or of adding hardware information actually implemented for realizing the required improved performance, thereby making it difficult to create a database. In contrast, according to the present invention, since compact data paths are provided as templates and combination of the data paths is designated so as to conduct the data-flow-type processing, cooperation between hardware and software for improving performance is easily estimated in an extremely fine-grained manner. It is also possible to accumulate trade-off information between hardware and software; therefore, the possible combinations of data paths can be linked closely to their degree of contribution to the processing performance. This makes it possible to accumulate estimation data relating to the cost, the performance for required processes, and the performance for execution, which are closely related to both hardware and software. In addition, since the data paths are implemented without discontinuing execution of the main processing or general-purpose processing, the expected result of an addition made in response to a performance request is predicted from the accumulated past data of the hardware and instruction sets of the present invention.

Therefore, the present invention contributes not only to significant reduction in current design and specification costs, but also to completing the next new design with the minimum trade-off between new hardware and software to be added. Moreover, depending on the processing type, lending an internal data path to the outside is facilitated; therefore hardware resource sharing becomes possible. Accordingly, parallel processing by a plurality of modules of the present invention (DAP/DNA modules) becomes one of the most useful aspects for implementing compact hardware.

Note that the aforementioned data processing system and instruction set are one of the embodiments of this invention. In the data processor, it is also possible to use an external RAM or ROM instead of the code RAM or data RAM or the like, and to additionally provide an interface with an external DRAM or SRAM or the like. Data processors additionally having known functions of a data processor such as a system LSI, e.g., an I/O interface for connection with another external device, are also included in the scope of the present invention. Accordingly, the present invention is understood and appreciated by the terms of the claims below, and all modifications covered by the claims below fall within the scope of the invention.

In a new programming environment provided by the instruction set and the data processing system of the present invention, it is possible to provide further special instructions in addition to those described above. Possible examples include: “XFORK” for activating, in addition to a current program, one or more objects (programs) simultaneously and supporting the parallel processing activation at the instruction level; “XSYNK” for synchronizing objects (programs); “XPIPE” for instructing pipeline connection between parallel processings; and “XSWITCH” for terminating a current object and activating the following object.

As has been described above, the technology including the instruction set of the present invention, programming using the instruction sets, and the data processing system capable of executing the instruction sets is based on a significantly improved principle of instruction-set structure or configuration. Therefore, the explained problems that are hard to address with the prior art are solved and significant improvement in performance is achieved.

In this invention, the structure of instruction sets is reviewed and constructed from a standpoint completely different from the conventional one; thus, the instruction set of the present invention extremely efficiently solves many problems that seem to be extremely hard to solve with the prior art. In the prior art, the structure of the instruction set and the instruction supply (acquisition) method using hardware have been implemented based on extremely standardized, traditional preconceived ideas, thereby hindering solution of the problems in the essential sense. The conventional attempts to solve all the problems with a huge, complicated hardware configuration have caused a significant increase in costs for developing the technology that is to contribute to society. The cost is also increased in various information processing products configured based on that technology. In the present invention, an instruction set that returns to what an instruction set should originally be and gives priority to the application requirements is provided. Therefore, this invention provides means that is not only capable of improving product performance efficiency but is also more likely to attain high development efficiency and quality assurance of the products.

Moreover, according to the present invention, data paths (data flows) capable of contributing to improved performance can be accumulated with the resources, i.e., the templates and the instruction sets for utilizing the templates. The accumulated data paths can then be updated at any time based on subsequently added hardware configuration information and sequence information for performing the data processing, so that the optimal solution is easily obtained. Accordingly, by the present invention, resource sharing between applications, resource sharing in hardware and investment in hardware for improving performance, which have conventionally been pointed out, will proceed in a more desirable manner, and this invention will contribute significantly as technology infrastructure for constructing a networked society.

INDUSTRIAL APPLICABILITY

The data processing system of the present invention is provided as a processor, LSI or the like capable of executing various data processings, and is applicable not only to the integrated circuits of electronic devices, but also to optical devices, and even to optical integrated circuit devices integrating electronic and optical devices. In particular, a control program including the instruction set of the present invention and the data processor are capable of flexibly executing the data processing at a high speed, and are preferable for processes required to have high-speed performance and real-time performance, like network processing and image processing.

1. A control program product comprising an instruction set including a first field for describing an execution instruction for designating content of an operation or data processing that is executed in at least one processing unit forming a data processing system, and a second field for describing preparation information for setting the processing unit to a state that is ready to execute the operation or data processing that is executed according to the execution instruction, the preparation information in the second field is for the operation or data processing being independent of the content of the execution instruction described in the first field of the instruction set, wherein the preparation information for the execution instruction described in the first field of a subsequent instruction set is described in the second field.
2. The control program product of claim 1, wherein the preparation information includes information for designating an input and/or output interface of the processing unit independently of execution timing of the processing unit.
3. The control program product of claim 1, wherein the preparation information includes information for designating content of processing of the processing unit.
4. The control program product of claim 1, wherein the data processing system includes a plurality of the processing units, and the preparation information includes information for designating a combination of data paths by the processing units.
5. The control program product of claim 1, wherein the processing unit includes a specific internal data path, and the preparation information includes information for selecting a part of the internal data path.
6. The control program product of claim 1, wherein the preparation information includes information for designating input/output interfaces in a processing block formed from a plurality of the processing units.
7. The program product of claim 6, wherein the data processing system includes a memory storing a plurality of configuration data defining the input and/or output interfaces in the processing block, and the preparation information includes information for selecting one of the plurality of configuration data stored in the memory for changing the input and/or output interfaces in the processing block.
8. The control program product of claim 1, wherein an instruction designating input/output between a register or buffer and a memory is described in the second field.
9. The control program product of claim 1, wherein a plurality of the execution instructions and/or the preparation information are described in the first and/or second field respectively.
10. A control program product comprising an instruction set including a first field for describing an execution instruction for designating content of an operation or data processing that is executed in at least one processing unit forming a data processing system, and a second field for describing preparation information for setting the processing unit to a state that is ready to execute the operation or data processing that is executed according to the execution instruction, the preparation information in the second field is for the operation or data processing being independent of the content of the execution instruction described in the first field of the instruction set, wherein the data processing system has a first control unit including an arithmetic/logic unit as the processing unit, and a second control unit including as the processing units a plurality of data flow processing units including a specific internal data path, and the control program product includes the instruction set in which the execution instruction for operating the arithmetic/logic unit is described in the first field, and the preparation information designating interfaces of the arithmetic/logic unit and/or the data flow processing units is described in the second field.
11. The control program product of claim 10, wherein the preparation information includes information for designating a combination of data paths by the data flow processing units.
12. The control program product of claim 10, wherein the preparation information includes information for selecting a part of the internal data path.
13. The control program product of claim 10, wherein an instruction designating input/output between a register or buffer and a memory is described in the second field.
14. The control program product of claim 10, wherein a plurality of the execution instructions and/or the preparation information are described in the first and/or second field respectively.
15. A recording medium recording thereon a control program comprising an instruction set including: a first field for describing an execution instruction for designating content of an operation or data processing that is executed in at least one processing unit forming a data processing system; a second field for describing preparation information for setting the processing unit to a state that is ready to execute the operation or data processing that is executed according to the execution instruction, the preparation information in the second field being for the operation or data processing that is independent of the content of the execution instruction described in the first field of the instruction set; and a third field for indicating, independently of the first field, valid/invalid of the second field and a type of the preparation information.
16. A transmission medium having embedded therein a control program comprising an instruction set including: a first field for describing an execution instruction for designating content of an operation or data processing that is executed in at least one processing unit forming a data processing system; a second field for describing preparation information for setting the processing unit to a state that is ready to execute the operation or data processing that is executed according to the execution instruction, the preparation information in the second field being for the operation or data processing that is independent of the contents of the execution instruction described in the first field of the instruction set; and a third field for indicating, independently of the first field, valid/invalid of the second field and a type of the preparation information.
 17. A data processing system, comprising: at least oneprocessing unit for executing an operation or data processing; a unitfor fetching an instruction set including a first field for describingan execution instruction for designating content of the operation ordata processing that is executed in the processing unit, and a secondfield for describing preparation information for setting the processingunit to a state that is ready to execute the operation or dataprocessing that is executed according to the execution instruction; afirst execution control unit for decoding the execution instruction inthe first field and proceeding with the operation or data processing bythe processing unit that is preset so as to be ready to execute theoperation or data processing of the execution instruction; and a secondexecution control unit for decoding the preparation information in thesecond field and, independently of content of the proceeding of thefirst execution control unit, setting a state of the processing unit soas to be ready to execute an operation or data processing.
 18. The dataprocessing system of claim 17, wherein the first or second executioncontrol unit includes a plurality of execution control portions forindependently processing a plurality of independent executioninstructions or preparation information that are described in the firstor second field respectively.
 19. The data processing system of claim17, wherein the second execution control unit sets an input and/oroutput interface of the processing unit independently of executiontiming of the processing unit.
 20. The data processing system of claim17, wherein the second execution control unit defines content ofprocessing of the processing unit.
 21. The data processing system ofclaim 17, comprising a plurality of the processing units, wherein thesecond execution control unit controls a combination of data paths bythe processing units.
 22. The data processing system of claim 17,wherein the processing unit includes a specific internal data path. 23.The data processing system of claim 17, wherein the processing unitincludes at least one logic gate and an internal data path connectingthe logic gate with an input/output interface.
24. The data processing system of claim 22, wherein the second execution control unit selects a part of the internal data path of the processing unit according to the preparation information.
25. The data processing system of claim 17, wherein the second execution control unit changes input and/or output interfaces in a processing block formed from a plurality of the processing units, according to the preparation information.
26. The data processing system of claim 25, comprising a memory storing a plurality of configuration data defining the input and/or output interfaces in the processing block, wherein the second execution control unit changes the input and/or output interfaces in the processing block by selecting one of the plurality of configuration data stored in the memory according to the preparation information.
27. The data processing system of claim 17, wherein the second execution control unit has a function as a scheduler for managing an interface of the processing unit.
28. The data processing system of claim 17, further comprising a first control unit including an arithmetic/logic unit as the processing unit, and a second control unit having as the processing units a plurality of data flow processing units including a specific data path, wherein the first execution control unit operates the arithmetic/logic unit, and the second execution control unit sets interfaces of the arithmetic/logic unit and/or the data flow processing units.
29. The data processing system of claim 28, wherein the second execution control unit controls a combination of data paths by the data flow processing units.
30. The data processing system of claim 28, wherein the data flow processing unit has a specific internal data path, and the second execution control unit selects a part of the internal data path of the data flow processing unit according to the preparation information.
31. The data processing system of claim 17, wherein the second execution control unit has a function to control input/output between a register or buffer and a memory.
32. A method for controlling a data processing system including at least one processing unit for executing an operation or data processing, comprising: a step of fetching an instruction set including a first field for describing an execution instruction for designating content of the operation or data processing that is executed in the processing unit, and a second field for describing preparation information for setting the processing unit to a state that is ready to execute the operation or data processing that is executed according to the execution instruction; a first control step of decoding the execution instruction in the first field and proceeding with the operation or data processing by the processing unit that is preset so as to be ready to execute the operation or data processing of the execution instruction; and a second control step of decoding, independently of the first control step, the preparation information in the second field and setting a state of the processing unit so as to be ready to execute the operation or data processing.
33. The method of claim 32, wherein in the second control step, an input and/or output interface of the processing unit is set independently of execution timing of the processing unit.
34. The method of claim 32, wherein in the second control step, content of processing of the processing unit is defined.
35. The method of claim 32, wherein the data processing system includes a plurality of the processing units, and in the second control step, a combination of data paths by the processing units is controlled.
36. The method of claim 32, wherein the processing unit has a specific internal data path, and in the second control step, a part of the internal data path of the processing unit is selected.
37. The method of claim 32, wherein in the second control step, input and/or output interfaces in a processing block formed from a plurality of the processing units are changed.
38. The method of claim 32, wherein the data processing system includes a memory storing a plurality of configuration data defining the input and/or output interfaces in the processing block, and in the second control step, the input and/or output interfaces in the processing block are changed by selecting one of the plurality of configuration data stored in the memory.
39. The method of claim 32, wherein in the second control step, a schedule retaining an interface of the processing unit is managed.
40. The method of claim 32, wherein in the second control step, input/output between a register or buffer and a memory is controlled.
41. A control program product comprising an instruction set including: a first field for describing an execution instruction for designating content of an operation or data processing that is executed in at least one processing unit forming a data processing system; a second field for describing preparation information for setting the processing unit to a state that is ready to execute the operation or data processing that is executed according to the execution instruction, the preparation information in the second field being for the operation or data processing that is independent of the content of the execution instruction described in the first field of the instruction set; and a third field for indicating, independently of the first field, valid/invalid of the second field and a type of the preparation information.
42. The control program product of claim 41, wherein the preparation information for the execution instruction described in the first field of a subsequent instruction set is described in the second field.
43. The control program product of claim 41, wherein the preparation information includes information for designating an input and/or output interface of the processing unit independently of execution timing of the processing unit.
44. The control program product of claim 41, wherein the preparation information includes information for designating content of processing of the processing unit.
45. The control program product of claim 41, wherein the data processing system includes a plurality of the processing units, and the preparation information includes information for designating a combination of data paths by the processing units.
46. The control program product of claim 41, wherein the processing unit includes a specific internal data path, and the preparation information includes information for selecting a part of the internal data path.
47. The control program product of claim 41, wherein the preparation information includes information for designating input/output interfaces in a processing block formed from a plurality of the processing units.
48. The program product of claim 47, wherein the data processing system includes a memory storing a plurality of configuration data defining the input and/or output interfaces in the processing block, and the preparation information includes information for selecting one of the plurality of configuration data stored in the memory for changing the input and/or output interfaces in the processing block.
49. The control program product of claim 41, wherein the data processing system has a first control unit including an arithmetic/logic unit as the processing unit, and a second control unit including as the processing units a plurality of data flow processing units including a specific internal data path, and the control program product includes the instruction set in which the execution instruction for operating the arithmetic/logic unit is described in the first field, and the preparation information designating interfaces of the arithmetic/logic unit and/or the data flow processing units is described in the second field.
50. The control program product of claim 49, wherein the preparation information includes information for designating a combination of data paths by the data flow processing units.
51. The control program product of claim 49, wherein the preparation information includes information for selecting a part of the internal data path.
52. The control program product of claim 41, wherein an instruction designating input/output between a register or buffer and a memory is described in the second field.
53. The control program product of claim 41, wherein a plurality of the execution instructions and/or the preparation information are described in the first and/or second field respectively.
54. A data processing system, comprising: at least one processing unit for executing an operation or data processing; a unit for fetching an instruction set including a first field for describing an execution instruction for designating content of the operation or data processing that is executed in the processing unit, a second field for describing preparation information for setting the processing unit to a state that is ready to execute the operation or data processing that is executed according to the execution instruction, and a third field for indicating, independently of the first field, valid/invalid of the second field and a type of the preparation information; a first execution control unit for decoding the execution instruction in the first field and proceeding with the operation or data processing by the processing unit that is preset so as to be ready to execute the operation or data processing of the execution instruction; and a second execution control unit for decoding the preparation information in the second field based on information in the third field and, independently of content of the proceeding of the first execution control unit, setting a state of the processing unit so as to be ready to execute an operation or data processing.
55. A method for controlling a data processing system including at least one processing unit for executing an operation or data processing, comprising: a step of fetching an instruction set including a first field for describing an execution instruction for designating content of the operation or data processing that is executed in the processing unit, a second field for describing preparation information for setting the processing unit to a state that is ready to execute the operation or data processing that is executed according to the execution instruction, and a third field for indicating, independently of the first field, valid/invalid of the second field and a type of the preparation information; a first control step of decoding the execution instruction in the first field and proceeding with the operation or data processing by the processing unit that is preset so as to be ready to execute the operation or data processing of the execution instruction; and a second control step of decoding, independently of the first control step, the preparation information in the second field based on information in the third field and setting a state of the processing unit so as to be ready to execute the operation or data processing.
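
By way of illustration only, the control method of claim 55 may be modeled in software as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the names InstructionSet, ProcessingUnit, CONFIG_MEMORY, first_control, second_control and run, the "RUN"/"NOP" opcodes, and the "select_config" preparation type are all hypothetical, and the two control steps are run one after the other in a loop, whereas the claims recite that they proceed independently. The sketch shows the first control step executing on the processing unit as it was preset by an earlier instruction set, while the second control step, gated by the third field, selects configuration data for a subsequent execution instruction.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical configuration memory: each entry is one configuration of the
# processing unit (an assumption made for illustration, loosely after claim 38).
CONFIG_MEMORY: Dict[int, Callable[[int, int], int]] = {
    0: lambda a, b: a + b,
    1: lambda a, b: a * b,
    2: lambda a, b: a - b,
}


@dataclass(frozen=True)
class InstructionSet:
    exec_field: Tuple[str, int, int]   # first field: execution instruction
    prep_field: int                    # second field: preparation information
    prep_valid: bool                   # third field: valid/invalid of second field
    prep_type: str = "select_config"   # third field: type of preparation information


class ProcessingUnit:
    """A unit whose operation is whatever configuration was last set."""

    def __init__(self) -> None:
        self.operation = CONFIG_MEMORY[0]

    def execute(self, a: int, b: int) -> int:
        return self.operation(a, b)


def first_control(unit: ProcessingUnit, instr: InstructionSet) -> Optional[int]:
    """Decode the first field and run on the unit as it is already preset."""
    opcode, a, b = instr.exec_field
    if opcode == "RUN":
        return unit.execute(a, b)
    return None  # "NOP": nothing to execute


def second_control(unit: ProcessingUnit, instr: InstructionSet) -> None:
    """Decode the second field, guided by the third field, and set the unit."""
    if not instr.prep_valid:
        return  # second field is marked invalid: leave the unit unchanged
    if instr.prep_type == "select_config":
        unit.operation = CONFIG_MEMORY[instr.prep_field]


def run(program: List[InstructionSet]) -> List[int]:
    unit = ProcessingUnit()
    results: List[int] = []
    for instr in program:
        # The first control step uses the state prepared by an earlier set.
        result = first_control(unit, instr)
        if result is not None:
            results.append(result)
        # The second control step prepares the unit for a subsequent set.
        second_control(unit, instr)
    return results


program = [
    InstructionSet(("NOP", 0, 0), prep_field=1, prep_valid=True),    # prepare multiply
    InstructionSet(("RUN", 3, 4), prep_field=2, prep_valid=True),    # multiply; prepare subtract
    InstructionSet(("RUN", 10, 4), prep_field=0, prep_valid=False),  # subtract; second field ignored
    InstructionSet(("RUN", 10, 4), prep_field=0, prep_valid=False),  # still subtract
]
results = run(program)
```

In this sketch the third instruction set carries a second field of 0, but because the third field marks it invalid the unit keeps its subtract configuration for the fourth instruction set. The preparation in each instruction set never affects the execution instruction in the same set, only a later one.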